ABSTRACT

This study evolved from a practical field situation
that dictated an item format change. The item type in question is a
variety of the multiple true-false item, widely used in state and
municipal civil service examinations. Items are produced not by
combining pairs of independent true-false items, but as a means of
salvaging not quite adequate four-choice multiple items. The original
items may have no right answer; two, three, or four right answers; or
one or two ambiguous or nonplausible responses. Items of this type
were developed for use in a national testing program for automotive
mechanics. After review by a number of test specialists and
mechanics, it was decided that many items were faulty. A format
variation was developed which seemed to be more clear cut, informal,
and easier to read and understand. The revised items were used in a
test battery. In practical terms, the overall effect of changing item
format was to make the items easier by an amount that would make
mean percent correct scores higher by less than one percent. The real
effect was to eliminate protests about the test questions. (EC)
There has been relatively little investigation of the effect that
changing the format of multiple choice test items has on item difficulty.
Such studies that have been done have dealt primarily with the effects
of violating some of the ancient principles of item construction, or on
the effect of using three, four, or five answer choices. Dunn, Goldstein,
and Berkhouse (1956) studied effects on difficulty, validity, and
reliability of tests by violation of certain item writing principles:
use of incomplete statement stems, avoidance of specific determiners,
responses of equal length, and consistency of grammar between stem and
responses. They found that items with specific determiners, with longer
correct choices, and inconsistent grammar clues were easier, but no less
reliable or valid, than items "correctly" written.

McMorris, Brown, Snyder, and Pruzek (1972), in a similar study,
found that faults made the items easier, but that validity and reliability
coefficients were virtually unchanged.

Board and Whitney (1972) found that violating good item writing
practices generally made tests easier, as well as less reliable and
valid. These, and similar studies, conform to the standard pattern
of experimental design.

The study reported here evolved from a practical field situation
that dictated an item format change. The item type in question is a
variety of the multiple true-false item, widely used in state and
municipal civil service examinations. Items are produced not by
combining pairs of independent true-false items, but as a means of
salvaging not quite adequate four-choice multiple choice items. The
original items may have no right answer; two, three, or four right
answers; or one or two ambiguous or non-plausible responses.
Conventionally, the stem and two viable responses are put into a format
similar to the following:

Example 1. On a truck with a vacuum power booster, too much
pedal pressure is needed to apply the brakes.
What is a likely cause?

I. An air leak on the brake pedal side of
the power booster cylinder

II. A leak in the diaphragm of the power brake
booster

(A) I only (B) II only (C) either I or II (D) neither I nor II
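
The keying logic of this item type can be sketched in a few lines. This is a minimal illustration only, not part of the original study: the function name `key_for` and its boolean inputs are hypothetical, and it assumes each of the two statements can be judged independently as a viable cause.

```python
def key_for(statement_1_ok: bool, statement_2_ok: bool) -> str:
    """Return the keyed option (A-D) for a two-statement item.

    Hypothetical helper: each argument says whether a subject-matter
    reviewer judged that statement to be a viable answer to the stem.
    """
    if statement_1_ok and statement_2_ok:
        return "C"  # either I or II
    if statement_1_ok:
        return "A"  # I only
    if statement_2_ok:
        return "B"  # II only
    return "D"      # neither I nor II
```

Under this sketch, a raw four-choice item with no defensible answer, or with two defensible answers, can still be given a single key once two of its responses are retained as statements I and II.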
Items in this format were developed for use in a national testing
program for automobile mechanics. Raw four-choice items were written
by instructors and service engineers who had had little item writing
experience. Test specialists revised the items, which were then
reviewed by separate panels of journeymen mechanics and service engineers.
Many items were determined to be faulty, having no or multiple answers,
or other ambiguities. The only feasible procedure was to recast the
items into the format shown above.

After the items had been used in operational tests, there were
protests from both mechanics and their employers that the items were
too formal, too artificial, and too hard to read.

A format variation was developed, which seemed to be more clear
cut, informal, and easier to read and understand. Example 2, below, is
an example of this format.

Example 2. On a truck with a vacuum power booster, too much
pedal pressure is needed to apply the brakes.

Mechanic A says that an air leak on the brake
pedal side of the power booster cylinder could
be the cause.

Mechanic B says that a leak in the diaphragm
of the power brake booster could be the cause.

Who is right?

(A) A only (B) B only (C) both A and B (D) neither A nor B

The revised items were used in the test battery given in the Spring
of 1974.

One hundred and four items, distributed among seven of the eight
tests of the battery, were identified as having been used in both formats.
The item difficulty levels were reviewed. Fifty-eight revised items were
more difficult, 36 were easier, and 10 did not change in difficulty.
Expressed in terms of percent correct, the mean changes in difficulty of
the modified items were as follows:

Test              1     2     3     4     5     6     7

% Mean Change   -0.4  -0.1   0.0  +4.7  -1.3  -1.5  -0.1

For the 104 items, the mean difficulty level change was less than
+.03%.
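
The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: difficulty is taken as percent correct, the change is computed as revised minus original (so a positive change means the revised item was easier), and the item values in the usage lines are invented purely for illustration.

```python
from statistics import mean

def summarize_changes(pairs):
    """Summarize difficulty changes for items used in both formats.

    pairs: list of (percent_correct_original, percent_correct_revised).
    Assumed sign convention: change = revised - original, so a negative
    change means the revised item was harder.
    """
    changes = [revised - original for original, revised in pairs]
    return {
        "harder": sum(1 for c in changes if c < 0),
        "easier": sum(1 for c in changes if c > 0),
        "unchanged": sum(1 for c in changes if c == 0),
        "mean_change": mean(changes),
    }

# Hypothetical item data, NOT taken from the study.
example = [(60, 58), (45, 50), (70, 70), (82, 80)]
summary = summarize_changes(example)
```

Because the number of items in each test is not reported here, the overall mean for the 104 items cannot be recovered from the seven per-test means alone; it would have to be an item-weighted average of them.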

In practical terms, the overall effect of changing item format was
to make the test items easier by an amount that would make mean percent
correct scores higher by less than one percent. The real effect of the
format change has been to almost completely eliminate the protests
about "those damn confusing, unreadable test questions."